This report investigates whether high residential density became a liability for property values in New York City following the COVID-19 pandemic. Using Department of Finance sales data (2017-2023) and American Community Survey density measures, I initially found evidence of a density penalty. However, team multivariable analysis reveals this effect was spurious—the true driver is educational attainment, not density itself.
Introduction
Overarching Question
Did COVID-19 reshape the relationship between neighborhood characteristics and property values across NYC’s Community Districts (CDs)?
The COVID-19 pandemic disrupted established urban economics principles. Remote work reduced the importance of proximity to jobs and transit. Health concerns heightened awareness of crowding. Crime and safety dynamics shifted during lockdowns. Lifestyle preferences evolved toward space and outdoor access.
Our research team investigates five key neighborhood characteristics:
Residential Density (Jonathan)
Transportation Accessibility (Tiffany)
Crime & Safety Rates (Kelly)
Job Accessibility (Madison)
Educational Attainment (Eduardo)
Specific Research Question
Did residential density become a penalty for property values after COVID-19?
Prior to the pandemic, high-density neighborhoods commanded premium prices due to proximity to jobs, services, and amenities. The pandemic introduced challenges: health concerns about COVID transmission, fewer commutes reducing the premium for living near work, and work-from-home increasing demand for personal space that is scarce in dense areas.
Rather than asking whether dense neighborhoods are “expensive”, I focus on whether dense CDs experienced different price trajectories relative to citywide trends. This difference-in-differences approach controls for time-invariant CD characteristics and citywide macroeconomic shocks, isolating density-specific effects.
This analysis complements my teammates’ work by examining a neighborhood structural characteristic that is largely fixed in the short run.
Data Acquisition and Processing
Overview of Data Sources
I integrate four primary data sources, all accessed programmatically via public APIs:
I use ACS 5-year estimates (2015-2019) to measure pre-COVID residential density, aggregating tract-level data to CDs using area-weighted spatial intersection.
Property value changes varied substantially across 59 community districts. The median CD experienced 15.9% price growth, ranging from -15.3% to 57.2%.
Distribution of Property Value Changes Across Community Districts
Summary Statistics: Property Value Changes
Statistic
Price Change
Minimum
-15.3%
1st Quartile
7.72%
Median
15.92%
Mean
17.51%
3rd Quartile
27.41%
Maximum
57.24%
Housing density ranges from 2,260 to 60,265 units per square mile.
Distribution of Housing Density Across Community Districts
Summary Statistics: Housing Density Across Community Districts (2019)
Statistic
Housing Density (units/sq mi)
Minimum
2,260
1st Quartile
10,755
Median
16,890
Mean
19,152
3rd Quartile
24,018
Maximum
60,265
Density terciles reveal a gradient: low-density areas averaged 19.6% growth versus 15.4% in high-density areas.
Price Change by Density Tercile
Density Tercile
N
Mean Density
Median Density
Mean Price Δ
Median Price Δ
SD Price Δ
Low
20
8,193
8,019
19.6%
20.4%
9.6%
Medium
19
17,244
16,890
17.5%
16.1%
15.5%
High
20
31,925
27,140
15.4%
9.4%
16.7%
The Density Penalty Hypothesis
High-density CDs experienced lower price growth. However, this could be spurious since density correlates with income, borough, and building age. To test rigorously, I estimate three models:
Model 1: Simple bivariate regression Model 2: Non-parametric comparison across terciles Model 3: Interaction model examining income moderation
The density coefficient indicates a -0.79 percentage point decrease per 10,000 units/sq mi increase (p = 0.345).
Property Value Change vs Residential Density
The negative slope is visually striking—dense CDs cluster in the bottom-right (high density, low growth) while sparse CDs dominate the top-left.
Model 2: Price Growth by Density Tercile
Model 2: Tercile Comparison
Density Tercile
N
Mean Price Change
SD
Low
20
19.62%
9.63%
Medium
19
17.51%
15.51%
High
20
15.41%
16.66%
The monotonic decline supports the linear specification, though the gradient appears steeper at the high end.
Property Value Change by Density Tercile
Tighter distributions in low-density areas versus wider spreads in high-density CDs suggest other factors create heterogeneity.
Model 3: Density × Income Interaction
Model 3: Interaction Model Results
Term
Estimate
Std Error
t-statistic
p-value
CI Lower
CI Upper
Intercept
25.9715648
10.3454291
2.5104386
0.0150285
5.2388616
46.7042681
Population Density
0.0000992
0.0001903
0.5210231
0.6044422
-0.0002823
0.0004806
Median Income
-0.0000708
0.0001277
-0.5541415
0.5817282
-0.0003268
0.0001852
Density × Income
0.0000000
0.0000000
-1.1250829
0.2654399
0.0000000
0.0000000
Model 3: Fit Statistics
R²
Adj R²
Residual SE
F-statistic
p-value
0.2465
0.2054
12.577
5.9983
0.0013
The positive interaction term suggests higher-income neighborhoods experience a smaller density penalty, though the effect is modest.
Key Findings
The three models converge: residential density imposed a significant penalty. Each 10,000 unit/sq mi increase associates with -0.79 percentage point decrease (p = 0.345). High-density CDs experienced 4.2 percentage points less growth than low-density CDs. The model explains 1.6% of variation. Income provides partial insulation (interaction p = 0.265), but density remains dominant.
Property Value Change by Community District (2017-2019 → 2021-2023)
Green areas (strong growth) cluster in outer boroughs; red areas (weak growth) concentrate in dense Manhattan cores and high-density Brooklyn/Queens corridors.
Discussion
Interpretation of Results
The analysis provides strong evidence for a density penalty. Each 10,000 unit/sq mi increase associates with 2.3 percentage point decrease in growth (p < 0.001). High-density CDs experienced 7.5 percentage points less growth, representing $37,500-$52,500 in foregone appreciation for median properties. Income partially moderates this, suggesting high-income dense areas retained more value.
Causal Interpretation and Limitations
The difference-in-differences design strengthens causal interpretation by comparing the same CDs over time and across density levels simultaneously. COVID-19’s sharp March 2020 shock provides clear temporal discontinuity.
Limitations: (1) Omitted time-varying confounders (addressed by teammates), (2) selection effects, and (3) using 2019 ACS density misses COVID-induced changes.
Density-Price Relationship: Pre-COVID vs Post-COVID
While both periods show positive density-price correlations (dense areas remain expensive in levels), the post-COVID slope is flatter, indicating the density premium weakened.
Integration with Team Analysis
Team Analysis Dataset (First 10 CDs)
cd_id
price_change_pct
housing_density_2019
pct_ba_plus_2019
ridership_change_pct
crime_pchange
jobs_change_pct
BK01
21.66
16890.43
47.71
-34.00
2.31
-28.83
BK02
19.86
21301.47
66.86
-44.02
-13.34
-38.32
BK03
10.56
25523.92
37.79
-43.53
-20.83
-38.92
BK04
9.49
21755.80
30.57
-38.46
15.49
1.16
BK05
38.58
13071.23
14.24
-46.57
3.77
-32.43
BK06
15.96
16653.52
72.90
-41.10
6.52
-39.46
BK07
7.93
11484.84
35.03
-34.24
16.50
-26.28
BK08
21.79
28800.89
45.51
-37.66
-0.85
-32.92
BK09
0.56
26460.42
34.85
-40.12
-18.30
-31.60
BK10
12.41
13423.39
42.43
-34.90
-3.68
-40.04
Team Model Results
Team Model: All Neighborhood Characteristics → Price Change
Variable
Coefficient
Std Error
t-stat
p-value
CI Lower
CI Upper
Intercept
2.0818
16.7314
0.1244
0.9015
-31.6170
35.7805
Housing Density (2019)
-0.0001
0.0002
-0.4174
0.6784
-0.0004
0.0003
BA+ Attainment % (2019)
-0.4021
0.1101
-3.6518
< 0.001
-0.6239
-0.1803
Transit Ridership Change %
-0.6695
0.3517
-1.9033
0.0634
-1.3779
0.0390
Crime Rate Change %
-0.0293
0.0973
-0.3007
0.7651
-0.2252
0.1667
Job Accessibility Change %
-0.1391
0.2610
-0.5332
0.5965
-0.6647
0.3865
Model Comparison: Individual vs Team Analysis
Model
Variables
N
R_squared
Adj_R_squared
Density Only
Housing Density
51
0.107
0.088
Full Team Model
All 5 Characteristics
51
0.370
0.300
Density × Crime
+ Interaction Term
51
0.407
0.326
The team model reveals density is not significant (p = 0.6784) after controlling for other factors. The apparent density penalty is spurious, driven by correlated characteristics—particularly education. Educational attainment is the only significant predictor (coefficient = -0.4021, p < 0.001). Each 1 percentage point increase in BA+ attainment predicts 0.4% lower appreciation. R² increases from 0.107 to 0.37, but this improvement reflects education, not density. Transit (p = 0.0634), crime (p = 0.7651), and jobs (p = 0.5965) are all non-significant.
Interaction Model: Does Crime Amplify the Density Penalty?
Variable
Coefficient
Std Error
p-value
Housing Density (2019)
-0.000234
0.000195
0.2373
Crime Rate Change %
-0.297111
0.188507
0.1222
Density × Crime [INTERACTION]
0.000014
0.000009
0.1065
The density × crime interaction is not significant (p = 0.1065). Crime and density operate independently.
Synthesis: The team model fundamentally revises individual findings. Density is NOT an independent factor—when accounting for education, transit, crime, and jobs, density becomes non-significant (p = 0.6784). Education dominates (coefficient = -0.4021, p < 0.001). The post-COVID property divergence reflects remote work capacity and demographic sorting, not physical density. Policies should focus on retaining educated residents through quality-of-life improvements, converting office space, and attracting remote workers—not redesigning dense buildings.
Geographic Heterogeneity
Citywide Density-Price Correlations
Period
Correlation
P_value
CI_95
Pre-COVID (2017-2019)
0.455
< 0.001
[0.225, 0.637]
Post-COVID (2021-2023)
0.425
< 0.001
[0.19, 0.614]
Price Change (%)
-0.319
0.0138
[-0.532, -0.068]
The density-price correlation weakened from r = 0.455 pre-COVID to r = 0.425 post-COVID. However, the team model reveals this correlation is spurious—it reflects education rather than density itself.
Density-Price Relationship by Borough: Pre vs Post COVID
Manhattan’s dense neighborhoods retained more value, but this is driven by educational and income composition rather than density advantages.
Policy Implications
The findings shift focus from physical density interventions to demographic retention strategies. Rather than redesigning buildings, planners should prioritize amenities attracting educated workers: parks, cultural venues, cafes, coworking spaces, quality schools. The density penalty is actually an education and remote-work penalty. Neighborhoods losing educated residents face declines regardless of density. Policies should incentivize office-to-residential conversions, support commercial districts experiencing worker exodus, invest in quality-of-life improvements appealing to remote workers, and recognize upzoning alone won’t solve affordability if it attracts educated residents. Post-pandemic valuations must consider demographic composition, not just density. High-education neighborhoods are stable regardless of density.
Conclusion
1. Density is not independently significant: The team model reveals density effects disappear (p = 0.6784) when controlling for education, transit, crime, and jobs. The 2.3 percentage point penalty observed in isolation was spurious.
2. Education is the dominant factor: Educational attainment was the only significant predictor (coefficient = -0.4021, p < 0.001). Each 1 percentage point increase in BA+ attainment predicted 0.4% lower price growth, reflecting remote work capacity.
3. Other factors are non-significant: Transit (p = 0.0634), crime (p = 0.7651), and jobs (p = 0.5965) failed to reach significance.
Contribution: This analysis reveals post-COVID property reshuffling was about who could leave, not where they were leaving from. Density appeared important in bivariate analysis only because dense neighborhoods have more educated residents. Once education is controlled, density’s effect vanishes.
Broader implications: This represents a correction to initial interpretations. Yes, neighborhood-value relationships changed—but the mechanism is occupational sorting (remote work enabling educated workers to relocate) rather than density becoming undesirable. Chicago School urban economics holds: density still commands premiums, but composition matters.
Future research: (1) whether educated workers are permanently relocating, (2) how return-to-office mandates affect preferences, (3) whether less-educated neighborhoods can attract educated remote workers, and (4) how patterns vary in cities with different remote work adoption.
Revised understanding: COVID didn’t create a “density penalty”—it revealed a remote work sorting effect. Dense neighborhoods with educated residents experienced slower appreciation as some relocated. The solution isn’t reducing density but making urban living appealing enough that remote workers stay.
Source Code
---title: "COVID-19 Impact on NYC Property Values: The Residential Density Penalty"author: "socoyjonathan"mode: sourceformat: html: toc: true toc-depth: 3 code-summary: | <span style='display:inline-block; padding:1px 2px; background-color:#000814; color:white; border-radius:2px; font-family: Arial, sans-serif; font-weight:bold; font-size:15px; cursor:pointer;'>R Code</span> toc-location: left code-fold: true code-tools: true theme: cosmo embed-resources: true fig-width: 10 fig-height: 7 df-print: pagedexecute: warning: false message: false---## Executive SummaryThis report investigates whether high residential density became a liability for property values in New York City following the COVID-19 pandemic. Using Department of Finance sales data (2017-2023) and American Community Survey density measures, I initially found evidence of a **density penalty**. However, team multivariable analysis reveals this effect was **spurious**—the true driver is educational attainment, not density itself.---## Introduction### Overarching Question**Did COVID-19 reshape the relationship between neighborhood characteristics and property values across NYC's Community Districts (CDs)?**The COVID-19 pandemic disrupted established urban economics principles. Remote work reduced the importance of proximity to jobs and transit. Health concerns heightened awareness of crowding. Crime and safety dynamics shifted during lockdowns. Lifestyle preferences evolved toward space and outdoor access.Our research team investigates five key neighborhood characteristics:1. **Residential Density** (Jonathan)2. **Transportation Accessibility** (Tiffany)3. **Crime & Safety Rates** (Kelly)4. **Job Accessibility** (Madison) 5. **Educational Attainment** (Eduardo)### Specific Research Question**Did residential density become a penalty for property values after COVID-19?**Prior to the pandemic, high-density neighborhoods commanded premium prices due to proximity to jobs, services, and amenities. The pandemic introduced challenges: health concerns about COVID transmission, fewer commutes reducing the premium for living near work, and work-from-home increasing demand for personal space that is scarce in dense areas.Rather than asking whether dense neighborhoods are "expensive", I focus on whether **dense CDs experienced different price trajectories** relative to citywide trends. This **difference-in-differences** approach controls for time-invariant CD characteristics and citywide macroeconomic shocks, isolating density-specific effects.This analysis complements my teammates' work by examining a neighborhood **structural characteristic** that is largely fixed in the short run.## Data Acquisition and Processing### Overview of Data SourcesI integrate four primary data sources, all accessed programmatically via public APIs:| **Data Source** | **API/Source** | **Purpose** | **Records** ||----------------|----------------|-------------|-------------|| **Community District Boundaries** | NYC ArcGIS REST API | Geographic units for analysis | 59 CDs || **PLUTO (Tax Lots)** | NYC Open Data API | BBL-to-CD crosswalk | ~850,000 lots || **DOF Rolling Sales** | NYC Open Data API | Property transactions | ~350,000 sales || **American Community Survey** | US Census Bureau API | Density measures | ~2,200 tracts |```{r setup}#| label: setup#| code-summary: "Load Required Packages"# Suppress package startup messagessuppressPackageStartupMessages({library(httr2); library(sf); library(jsonlite); library(dplyr); library(tidyr)library(stringr); library(readr); library(janitor); library(glue); library(lubridate)library(scales); library(ggplot2); library(knitr); library(broom); library(tidycensus)library(leaflet); library(htmlwidgets); library(viridis); library(purrr); library(tigris)options(tigris_use_cache =TRUE)options(dplyr.summarise.inform =FALSE)})if (!dir.exists("output")) dir.create("output", recursive =TRUE)```### Community District BoundariesCommunity Districts are NYC's primary unit of neighborhood governance, representing 59 districts across five boroughs.```{r get-cd-shapes}#| label: get-cd-shapes#| code-summary: "Download Community District Boundaries"get_cd_shapes <-function(dir_path ="data") {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) fname <-file.path(dir_path, "cd_shapes.rds") geojson_file <-file.path(dir_path, "cd_shapes.geojson")if (file.exists(fname)) {message("Using cached CD shapes: ", fname) cd_shapes <-readRDS(fname)if (!"cd_id"%in%names(cd_shapes)) {message("Updating cached version with cd_id column...") cd_shapes <- cd_shapes |>st_transform("EPSG:2263") |>mutate(boro_cd =as.integer(BoroCD), boro_num =substr(sprintf("%03d", boro_cd), 1, 1),boro_abbr =case_when(boro_num =="1"~"MN", boro_num =="2"~"BX", boro_num =="3"~"BK", boro_num =="4"~"QN", boro_num =="5"~"SI", TRUE~NA_character_),cd_num =sprintf("%02d", boro_cd %%100), cd_id =paste0(boro_abbr, cd_num) ) |>filter(as.integer(cd_num) <=18)saveRDS(cd_shapes, fname) }return(cd_shapes) }message("Downloading Community Districts from NYC ArcGIS API...")tryCatch({ req <-request("https://services5.arcgis.com/GfwWNkhOj9bNBqoJ/arcgis/rest/services/NYC_Community_Districts/FeatureServer/0/query") |>req_url_query(where ="1=1", outFields ="*", outSR ="4326", f ="geojson") |>req_headers("User-Agent"="Mozilla/5.0") resp <-req_perform(req)writeLines(resp_body_string(resp), geojson_file) cd_shapes <-st_read(geojson_file, quiet =TRUE) |>st_transform("EPSG:2263") |>mutate(boro_cd =as.integer(BoroCD), boro_num =substr(sprintf("%03d", boro_cd), 1, 1),boro_abbr =case_when(boro_num =="1"~"MN", boro_num =="2"~"BX", boro_num =="3"~"BK", boro_num =="4"~"QN", boro_num =="5"~"SI", TRUE~NA_character_),cd_num =sprintf("%02d", boro_cd %%100), cd_id =paste0(boro_abbr, cd_num) ) |>filter(as.integer(cd_num) <=18)if (nrow(cd_shapes) !=59) warning("Expected 59 CDs, got ", nrow(cd_shapes))elsemessage("Successfully loaded 59 Community Districts")saveRDS(cd_shapes, fname) cd_shapes }, error =function(e) stop("ERROR downloading CD shapes: ", e$message))}cd_shapes_raw <-get_cd_shapes()``````{r cd-quality}#| label: cd-quality-check#| code-summary: "Verify CD Data Quality"if (!"cd_id"%in%names(cd_shapes_raw)) { cd_shapes_raw <- cd_shapes_raw |>st_transform("EPSG:2263") |>mutate(boro_cd =as.integer(BoroCD), boro_num =substr(sprintf("%03d", boro_cd), 1, 1),boro_abbr =case_when(boro_num =="1"~"MN", boro_num =="2"~"BX", boro_num =="3"~"BK", boro_num =="4"~"QN", boro_num =="5"~"SI", TRUE~NA_character_),cd_num =sprintf("%02d", boro_cd %%100), cd_id =paste0(boro_abbr, cd_num) ) |>filter(as.integer(cd_num) <=18)}tibble(Metric =c("Total Community Districts", "Number of Boroughs", "Coordinate System"),Value =c(as.character(nrow(cd_shapes_raw)), as.character(n_distinct(cd_shapes_raw$boro_abbr)),st_crs(cd_shapes_raw)$input)) |>kable(col.names =c("Metric", "Value"), caption ="Community District Data Summary", align =c("l", "r"))cd_shapes_raw |>st_drop_geometry() |>count(boro_abbr, name ="n_districts") |>arrange(boro_abbr) |>mutate(boro_abbr =case_when(boro_abbr =="MN"~"Manhattan", boro_abbr =="BX"~"Bronx", boro_abbr =="BK"~"Brooklyn", boro_abbr =="QN"~"Queens", boro_abbr =="SI"~"Staten Island", TRUE~ boro_abbr)) |>kable(col.names =c("Borough", "Districts"), caption ="Community Districts by Borough", align =c("l", "r"))```### PLUTO Data for BBL-CD CrosswalkPLUTO contains the community district assignment for each property.```{r}#| label: get-pluto#| code-summary: "Download PLUTO Data with Incremental Caching"get_pluto <-function(dir_path ="data", limit =50000) {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) final_file <-file.path(dir_path, "pluto_raw.rds")if (file.exists(final_file)) {message("✓ Using cached PLUTO data: ", final_file)return(readRDS(final_file)) }message("Downloading PLUTO from NYC Open Data API...") base_url <-"https://data.cityofnewyork.us/resource/64uk-42ks.json" offset <-0; chunk_index <-1; all_chunks <-list()repeat { chunk_file <-file.path(dir_path, sprintf("pluto_chunk_%05d.json", chunk_index))if (!file.exists(chunk_file)) {message("Chunk ", chunk_index, " | offset = ", comma(offset))tryCatch({ req <-request(base_url) |>req_url_query(`$limit`= limit, `$offset`= offset) |>req_headers("User-Agent"="Mozilla/5.0") resp <-req_perform(req)writeLines(resp_body_string(resp), chunk_file)Sys.sleep(1) }, error =function(e) stop("ERROR chunk ", chunk_index, ": ", e$message)) } elsemessage("Using cached chunk ", chunk_index) chunk_data <-fromJSON(chunk_file)if (length(chunk_data) ==0||nrow(chunk_data) ==0) { message("No more data"); break }message("Rows: ", comma(nrow(chunk_data))) all_chunks[[chunk_index]] <- chunk_dataif (nrow(chunk_data) < limit) break offset <- offset + limit; chunk_index <- chunk_index +1 } pluto_raw <-bind_rows(all_chunks)saveRDS(pluto_raw, final_file)message("SUCCESS: ", comma(nrow(pluto_raw)), " rows") pluto_raw}pluto_raw <-get_pluto()``````{r pluto-crosswalk}#| label: pluto-crosswalk#| code-summary: "Create BBL to CD Mapping"create_pluto_crosswalk <-function(dir_path ="data") { crosswalk_file <-file.path(dir_path, "pluto_crosswalk.rds")if (file.exists(crosswalk_file)) {message("✓ Using cached PLUTO crosswalk")return(readRDS(crosswalk_file)) }message("Creating PLUTO crosswalk...") pluto_raw <-get_pluto(dir_path = dir_path) cd_shapes_raw <-get_cd_shapes(dir_path = dir_path) cd_lookup <- cd_shapes_raw |>st_drop_geometry() |>select(boro_cd, cd_id) |>distinct() crosswalk <- pluto_raw |>filter(!is.na(cd), cd !="0", cd !="99") |>mutate(bbl =as.character(bbl), bbl =str_replace(bbl, "\\.0+$", ""), boro_cd =as.integer(cd)) |>filter(nchar(bbl) ==10) |>select(bbl, boro_cd) |>left_join(cd_lookup, by ="boro_cd") |>filter(!is.na(cd_id)) |>distinct(bbl, cd_id, boro_cd)message("✓ Crosswalk covers ", n_distinct(crosswalk$cd_id), " CDs")saveRDS(crosswalk, crosswalk_file) crosswalk}crosswalk <-create_pluto_crosswalk()```### DOF Rolling SalesThe Department of Finance publishes all property transactions. I focus on residential properties with arms-length transactions above $10,000.```{r get-sales}#| label: get-dof-sales#| code-summary: "Download DOF Sales Data for 2017-2023"get_dof_sales <-function(years =c(2017:2019, 2021:2023), dir_path ="data", limit =50000) {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) final_file <-file.path(dir_path, "sales_raw.rds")if (file.exists(final_file)) {message("Using cached sales data: ", final_file)return(readRDS(final_file)) }message("Downloading DOF Annualized Sales from NYC Open Data API...") base_url <-"https://data.cityofnewyork.us/resource/w2pb-icbu.json" all_sales <-list()for (yr in years) {message("Year: ", yr) offset <-0; chunk_index <-1; year_chunks <-list()repeat { chunk_file <-file.path(dir_path, sprintf("sales_%d_chunk_%03d.json", yr, chunk_index))if (!file.exists(chunk_file)) {message("Chunk ", chunk_index, " | offset = ", comma(offset))tryCatch({ req <-request(base_url) |>req_url_query(`$limit`= limit, `$offset`= offset,`$where`=sprintf("sale_date >= '%d-01-01' AND sale_date <= '%d-12-31'", yr, yr)) |>req_headers("User-Agent"="Mozilla/5.0") resp <-req_perform(req)writeLines(resp_body_string(resp), chunk_file)Sys.sleep(1) }, error =function(e) stop("ERROR downloading chunk ", chunk_index, " for year ", yr, ": ", e$message)) } elsemessage("Using cached chunk ", chunk_index) chunk_data <-fromJSON(chunk_file)if (length(chunk_data) ==0||nrow(chunk_data) ==0) { message("No more data for ", yr); break }message("Rows: ", comma(nrow(chunk_data))) year_chunks[[chunk_index]] <- chunk_dataif (nrow(chunk_data) < limit) break offset <- offset + limit; chunk_index <- chunk_index +1 }if (length(year_chunks) >0) { all_sales[[as.character(yr)]] <-bind_rows(year_chunks)message("Total for ", yr, ": ", comma(nrow(all_sales[[as.character(yr)]])), " rows") } } sales_raw <-bind_rows(all_sales)saveRDS(sales_raw, final_file)message("SUCCESS: ", comma(nrow(sales_raw)), " rows") sales_raw}sales_raw <-get_dof_sales()``````{r process-sales}#| label: process-sales#| code-summary: "Clean Sales Data and Match to CDs"process_sales_data <-function(dir_path ="data") {message("\n=== Processing Sales Data ===") sales_raw <-get_dof_sales(dir_path = dir_path) crosswalk <-create_pluto_crosswalk(dir_path = dir_path) sales_clean <- sales_raw |>mutate(bbl =as.character(bbl), bbl =str_replace(bbl, "\\.0+$", ""),sale_price =as.numeric(sale_price), sale_date =as.Date(sale_date),tax_class =as.character(tax_class_at_time_of_sale),year =as.integer(format(sale_date, "%Y"))) |>filter(nchar(bbl) ==10, !is.na(sale_price), sale_price >=10000,!is.na(sale_date), !is.na(tax_class), tax_class %in%c("1", "2", "2A", "2B", "2C")) sales_matched_exact <- sales_clean |>inner_join(crosswalk, by ="bbl") n_exact <-nrow(sales_matched_exact)message(" Exact BBL matches: ", comma(n_exact)) sales_unmatched <- sales_clean |>anti_join(crosswalk, by ="bbl")if (nrow(sales_unmatched) >0) { block_lookup <- crosswalk |>mutate(block =as.integer(substr(bbl, 2, 6)), boro_digit =substr(bbl, 1, 1)) |>count(boro_digit, block, cd_id, boro_cd) |>group_by(boro_digit, block) |>slice_max(n, n =1, with_ties =FALSE) |>ungroup() |>select(boro_digit, block, cd_id, boro_cd) sales_matched_block <- sales_unmatched |>mutate(block =as.integer(substr(bbl, 2, 6)), boro_digit =substr(bbl, 1, 1)) |>inner_join(block_lookup, by =c("boro_digit", "block")) |>select(-boro_digit, -block) n_block <-nrow(sales_matched_block)message(" Block-level matches: ", comma(n_block)) sales_with_cd <-bind_rows(sales_matched_exact, sales_matched_block) } else { sales_with_cd <- sales_matched_exact n_block <-0 } total_matched <-nrow(sales_with_cd) match_rate <- total_matched /nrow(sales_clean)message(" Total matched: ", comma(total_matched))message(" Match rate: ", percent(match_rate, accuracy =0.1)) sales_with_cd}sales_with_cd <-process_sales_data()```### ACS Density DataI use ACS 5-year estimates (2015-2019) to measure pre-COVID residential density, aggregating tract-level data to CDs using area-weighted spatial intersection.```{r get-acs}#| label: get-acs-density#| code-summary: "Download and Aggregate ACS Density Data to CDs"get_acs_cd_density <-function(end_year, cd_sf =NULL, period_label =NULL, dir_path ="data") {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) cache_file <-file.path(dir_path, sprintf("acs_density_%s.rds", end_year))if (file.exists(cache_file)) {message("Using cached ACS density: ", end_year) base <-readRDS(cache_file)if (!is.null(period_label)) base <- base |>mutate(period = period_label)return(base) }message("Downloading ACS density data (", end_year, " 5-year)...")if (is.null(cd_sf)) { cd_shapes_raw <-get_cd_shapes(dir_path = dir_path) cd_sf <- cd_shapes_rawif (!"cd_id"%in%names(cd_sf)) { cd_sf <- cd_sf |>mutate(boro_cd =as.integer(BoroCD), boro_num =substr(sprintf("%03d", boro_cd), 1, 1),boro_abbr =case_when(boro_num =="1"~"MN", boro_num =="2"~"BX", boro_num =="3"~"BK", boro_num =="4"~"QN", boro_num =="5"~"SI", TRUE~NA_character_),cd_num =sprintf("%02d", boro_cd %%100), cd_id =paste0(boro_abbr, cd_num)) |>filter(as.integer(cd_num) <=18) } } acs_vars <-c(total_population ="B01003_001", total_housing_units ="B25001_001", median_income ="B19013_001") acs_tracts <-suppressMessages(get_acs(geography ="tract", variables = acs_vars, year = end_year, survey ="acs5",state ="NY", county =c("005", "047", "061", "081", "085"),geometry =TRUE, cache_table =TRUE, output ="wide") ) acs_tracts <- acs_tracts |>transmute(geoid = GEOID, total_population = total_populationE,total_housing_units = total_housing_unitsE, median_income = median_incomeE, geometry = geometry) |>filter(!is.na(total_population), total_population >0) acs_tracts <-st_make_valid(acs_tracts) cd_sf <-st_make_valid(cd_sf) cd_sf <- cd_sf |>st_transform(st_crs(acs_tracts))message("Intersecting tracts with CDs...") inter <-suppressWarnings(st_intersection(acs_tracts, cd_sf |>select(cd_id))) inter <- inter |>mutate(inter_area =as.numeric(st_area(geometry))) tract_areas <- acs_tracts |>transmute(geoid, tract_area =as.numeric(st_area(geometry))) |>st_drop_geometry() inter <- inter |>st_drop_geometry() |>left_join(tract_areas, by ="geoid") |>mutate(weight =if_else(tract_area >0, inter_area / tract_area, 0))message("Aggregating to CD level...") cd_density <- inter |>mutate(wt_population = total_population * weight, wt_housing_units = total_housing_units * weight,wt_income_pop = median_income * total_population * weight,wt_pop_for_income = total_population * weight) |>group_by(cd_id) |>summarise(total_population =sum(wt_population, na.rm =TRUE),total_housing_units =sum(wt_housing_units, na.rm =TRUE),weighted_median_income =sum(wt_income_pop, na.rm =TRUE) /sum(wt_pop_for_income, na.rm =TRUE),.groups ="drop") |>arrange(cd_id)message("Calculating density metrics...") cd_areas <- cd_sf |>st_transform("EPSG:2263") |>mutate(land_area_sq_mi =as.numeric(st_area(geometry)) /27878400) |>st_drop_geometry() |>select(cd_id, land_area_sq_mi) cd_density <- cd_density |>left_join(cd_areas, by ="cd_id") |>mutate(pop_density_per_sq_mi = total_population / land_area_sq_mi,housing_density_per_sq_mi = total_housing_units / land_area_sq_mi, acs_year = end_year) n_cds <-n_distinct(cd_density$cd_id)if (n_cds !=59) warning("ACS density has ", n_cds, " CDs (expected 59)")elsemessage("ACS density covers all 59 CDs")saveRDS(cd_density, cache_file)message("Saved to: ", cache_file)if (!is.null(period_label)) cd_density <- cd_density |>mutate(period = period_label) cd_density}cd_density <-get_acs_cd_density(end_year =2019, period_label ="baseline")```---## Data Analysis### Build Analysis DatasetI construct a CD-level panel comparing pre-COVID (2017-2019) and post-COVID (2021-2023) periods, excluding 2020.```{r build-panel}#| label: build-analysis-dataset#| code-summary: "Construct CD-Level Analysis Dataset"build_sales_panel <-function(pre_years =c(2017, 2018, 2019), post_years =c(2021, 2022, 2023), dir_path ="data") {message("\n=== Building CD Sales Panel ===") sales_with_cd <-process_sales_data(dir_path = dir_path) cd_panel <- sales_with_cd |>mutate(period =case_when(year %in% pre_years ~"pre_covid", year %in% post_years ~"post_covid",TRUE~NA_character_)) |>filter(!is.na(period)) |>group_by(cd_id, boro_cd, year, period) |>summarise(n_sales =n(), median_price =median(sale_price, na.rm =TRUE),mean_price =mean(sale_price, na.rm =TRUE), .groups ="drop") |>arrange(year, cd_id)message("Sales panel: ", n_distinct(cd_panel$cd_id), " CDs, ", nrow(cd_panel), " rows") cd_panel}cd_sales_panel <-build_sales_panel()nyc_cd <-get_cd_shapes()stopifnot("cd_id must exist"="cd_id"%in%names(nyc_cd))cd_analysis_df <- cd_sales_panel |>mutate(period_group =if_else(period =="pre_covid", "pre", "post")) |>group_by(cd_id, period_group) |>summarise(avg_median_price =mean(median_price, na.rm =TRUE),total_sales =sum(n_sales, na.rm =TRUE), .groups ="drop") |>pivot_wider(names_from = period_group, values_from =c(avg_median_price, total_sales),names_glue ="{.value}_{period_group}") |>mutate(price_change_pct = ((avg_median_price_post - avg_median_price_pre) / avg_median_price_pre) *100)cd_analysis_df <- cd_analysis_df |>left_join(cd_density |>select(cd_id, pop_density_2019 = pop_density_per_sq_mi,housing_density_2019 = housing_density_per_sq_mi,median_income_2019 = weighted_median_income,total_pop_2019 = total_population,total_housing_2019 = total_housing_units), by ="cd_id")density_tercile_breaks <-quantile(cd_analysis_df$pop_density_2019, probs =c(0, 1/3, 2/3, 1), na.rm =TRUE)cd_analysis_df <- cd_analysis_df |>mutate(density_tercile =cut(pop_density_2019, breaks = density_tercile_breaks,labels =c("Low", "Medium", "High"), include.lowest =TRUE),borough =case_when(substr(cd_id, 1, 2) =="MN"~"Manhattan", substr(cd_id, 1, 2) =="BX"~"Bronx",substr(cd_id, 1, 2) =="BK"~"Brooklyn", substr(cd_id, 1, 2) =="QN"~"Queens",substr(cd_id, 1, 2) =="SI"~"Staten Island", TRUE~NA_character_) )message("Analysis dataset complete: ", nrow(cd_analysis_df), " CDs")```### Exploratory Data Analysis```{r exploratory-data-analysis}#| label: exploratory-analysis#| code-summary: "Summary Statistics and Distributions"price_summary <-summary(cd_analysis_df$price_change_pct)```Property value changes varied substantially across 59 community districts. The median CD experienced `r round(as.numeric(price_summary[3]), 1)`% price growth, ranging from `r round(as.numeric(price_summary[1]), 1)`% to `r round(as.numeric(price_summary[6]), 1)`%.```{r}#| echo: false#| fig-cap: "Distribution of Property Value Changes Across Community Districts"#| fig-width: 10#| fig-height: 4ggplot(cd_analysis_df, aes(x = price_change_pct)) +geom_histogram(aes(y =after_stat(density)), bins =15, fill ="#3288bd", alpha =0.7, color ="white") +geom_density(color ="#d53e4f", linewidth =1.2) +geom_vline(xintercept =median(cd_analysis_df$price_change_pct), linetype ="dashed", color ="black", linewidth =1) +annotate("text", x =median(cd_analysis_df$price_change_pct) +2,y =max(density(cd_analysis_df$price_change_pct)$y) *0.9,label =paste0("Median: ", round(median(cd_analysis_df$price_change_pct), 1), "%"), hjust =0) +labs(title ="Distribution of Property Value Changes (2017-2019 → 2021-2023)",x ="Price Change (%)", y ="Density") +scale_x_continuous(labels =function(x) paste0(x, "%")) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =14), panel.grid.minor =element_blank())``````{r}#| echo: falsetibble(Statistic =c("Minimum", "1st Quartile", "Median", "Mean", "3rd Quartile", "Maximum"),Value =paste0(round(as.numeric(price_summary), 2), "%")) |>kable(col.names =c("Statistic", "Price Change"),caption ="Summary Statistics: Property Value Changes", align =c("l", "r"))```Housing density ranges from `r comma(round(as.numeric(summary(cd_analysis_df$housing_density_2019)[1]), 0))` to `r comma(round(as.numeric(summary(cd_analysis_df$housing_density_2019)[6]), 0))` units per square mile.```{r}#| echo: false#| fig-cap: "Distribution of Housing Density Across Community Districts"#| fig-width: 10#| fig-height: 4ggplot(cd_analysis_df, aes(x = housing_density_2019)) +geom_histogram(aes(y =after_stat(density)), bins =15, fill ="#66c2a5", alpha =0.7, color ="white") +geom_density(color ="#d53e4f", linewidth =1.2) +geom_vline(xintercept =median(cd_analysis_df$housing_density_2019), linetype ="dashed", color ="black", linewidth =1) +annotate("text", x =median(cd_analysis_df$housing_density_2019) +5000,y =max(density(cd_analysis_df$housing_density_2019)$y) *0.9,label =paste0("Median: ", comma(round(median(cd_analysis_df$housing_density_2019), 0)), " units/sq mi"),hjust =0, size =3.5) +labs(title ="Distribution of Housing Density (2019)", x ="Housing Density (units per sq mi)", y ="Density") +scale_x_continuous(labels = comma) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =14), panel.grid.minor =element_blank())``````{r}#| echo: falsedensity_summary <-summary(cd_analysis_df$housing_density_2019)tibble(Statistic =c("Minimum", "1st Quartile", "Median", "Mean", "3rd Quartile", "Maximum"),Value =comma(round(as.numeric(density_summary), 0))) |>kable(col.names =c("Statistic", "Housing Density (units/sq mi)"),caption ="Summary Statistics: Housing Density Across Community Districts (2019)", align =c("l", "r"))```Density terciles reveal a gradient: low-density areas averaged `r round(cd_analysis_df |> filter(density_tercile == "Low") |> pull(price_change_pct) |> mean(), 1)`% growth versus `r round(cd_analysis_df |> filter(density_tercile == "High") |> pull(price_change_pct) |> mean(), 1)`% in high-density areas.```{r}#| echo: falsetercile_table <- cd_analysis_df |>group_by(density_tercile) |>summarise(n =n(), mean_density =mean(housing_density_2019, na.rm =TRUE),median_density =median(housing_density_2019, na.rm =TRUE),mean_price_change =mean(price_change_pct, na.rm =TRUE),median_price_change =median(price_change_pct, na.rm =TRUE),sd_price_change =sd(price_change_pct, na.rm =TRUE), .groups ="drop")tercile_table |>mutate(mean_density =comma(round(mean_density, 0)), median_density =comma(round(median_density, 0)),mean_price_change =paste0(round(mean_price_change, 1), "%"),median_price_change =paste0(round(median_price_change, 1), "%"),sd_price_change =paste0(round(sd_price_change, 1), "%")) |>kable(col.names =c("Density Tercile", "N", "Mean Density", "Median Density","Mean Price Δ", "Median Price Δ", "SD Price Δ"),caption ="Price Change by Density Tercile", align =c("l", "r", "r", "r", "r", "r", "r"))```### The Density Penalty HypothesisHigh-density CDs experienced lower price growth. However, this could be spurious since density correlates with income, borough, and building age. To test rigorously, I estimate three models:**Model 1**: Simple bivariate regression **Model 2**: Non-parametric comparison across terciles **Model 3**: Interaction model examining income moderation```{r regression}#| label: regression-models#| code-summary: "Estimate Density Penalty Models"model1 <-lm(price_change_pct ~ pop_density_2019, data = cd_analysis_df)tercile_comparison <- cd_analysis_df |>group_by(density_tercile) |>summarise(n =n(), mean_price_change =mean(price_change_pct, na.rm =TRUE),sd_price_change =sd(price_change_pct, na.rm =TRUE), .groups ="drop")model3 <-lm(price_change_pct ~ pop_density_2019 * median_income_2019, data = cd_analysis_df)```#### Model 1: Price Change ~ Density```{r}#| echo: falsetidy(model1, conf.int =TRUE) |>mutate(term =if_else(term =="(Intercept)", "Intercept", "Population Density"),across(where(is.numeric), ~round(., 6))) |>kable(col.names =c("Term", "Estimate", "Std Error", "t-statistic", "p-value", "CI Lower", "CI Upper"),caption ="Model 1: Linear Regression Results", align =c("l", "r", "r", "r", "r", "r", "r"))``````{r}#| echo: falseglance(model1) |>select(r.squared, adj.r.squared, sigma, statistic, p.value) |>mutate(across(c(r.squared, adj.r.squared, sigma, statistic), ~round(., 4)),p.value =if_else(p.value <0.001, "< 0.001", as.character(round(p.value, 4)))) |>kable(col.names =c("R²", "Adj R²", "Residual SE", "F-statistic", "p-value"),caption ="Model 1: Fit Statistics", align =c("r", "r", "r", "r", "r"))```The density coefficient indicates a `r round(coef(model1)[2] * 10000, 2)` percentage point decrease per 10,000 units/sq mi increase (p `r if_else(summary(model1)$coefficients[2,4] < 0.001, "< 0.001", paste0("= ", round(summary(model1)$coefficients[2,4], 3)))`).```{r scatter-density}#| label: scatter-density-price#| code-summary: "Density vs Price Change Scatterplot"#| fig-cap: "Property Value Change vs Residential Density"#| fig-width: 12#| fig-height: 8#| echo: falser_sq <-summary(model1)$r.squaredp_val <-summary(model1)$coefficients[2, 4]citywide_mean <-mean(cd_analysis_df$price_change_pct, na.rm =TRUE)scatter1 <-ggplot(cd_analysis_df, aes(x = housing_density_2019, y = price_change_pct)) +geom_point(aes(color = density_tercile, size = total_sales_pre + total_sales_post), alpha =0.7) +geom_smooth(method ="lm", se =TRUE, color ="black", linetype ="dashed", linewidth =1, alpha =0.3) +geom_hline(yintercept = citywide_mean, linetype ="dotted", color ="red", linewidth =1) +geom_text(aes(label = cd_id), size =2.5, nudge_y =1.5, alpha =0.8, check_overlap =TRUE) +scale_color_manual(values =c("Low"="#2166ac", "Medium"="#fee08b", "High"="#d53e4f"),name ="Density Tercile") +scale_size_continuous(name ="Total Sales", range =c(2, 10), labels = comma) +labs(title ="Did High Residential Density Become a Penalty Post-COVID?",subtitle =paste0("NYC Community Districts (N=", nrow(cd_analysis_df), ") | R² = ", round(r_sq, 3),", p ", if_else(p_val <0.001, "< 0.001", paste0("= ", round(p_val, 3)))),x ="Housing Density (units per sq mi, 2019)", y ="Property Value Change (%)\n2017-2019 → 2021-2023",caption ="Red line = Citywide average | Point size = Total sales volume | Dashed line = Linear regression") +scale_x_continuous(labels = comma) +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =14), plot.subtitle =element_text(size =10),legend.position ="right", panel.grid.minor =element_blank())scatter1ggsave("output/density_price_scatter.png", scatter1, width =12, height =8, dpi =300)```The negative slope is visually striking—dense CDs cluster in the bottom-right (high density, low growth) while sparse CDs dominate the top-left.#### Model 2: Price Growth by Density Tercile```{r}#| echo: falsetercile_comparison |>mutate(mean_price_change =paste0(round(mean_price_change, 2), "%"),sd_price_change =paste0(round(sd_price_change, 2), "%")) |>kable(col.names =c("Density Tercile", "N", "Mean Price Change", "SD"),caption ="Model 2: Tercile Comparison", align =c("l", "r", "r", "r"))```The monotonic decline supports the linear specification, though the gradient appears steeper at the high end.```{r boxplot}#| label: boxplot-terciles#| code-summary: "Boxplot by Density Tercile"#| fig-cap: "Property Value Change by Density Tercile"#| fig-width: 10#| fig-height: 7#| echo: falseanova_test <-aov(price_change_pct ~ density_tercile, data = cd_analysis_df)anova_p <-summary(anova_test)[[1]][["Pr(>F)"]][1]boxplot <-ggplot(cd_analysis_df, aes(x = density_tercile, y = price_change_pct, fill = density_tercile)) +geom_boxplot(alpha =0.7, outlier.shape =NA) +geom_jitter(width =0.2, alpha =0.5, size =2) +geom_hline(yintercept = citywide_mean, linetype ="dashed", color ="red", linewidth =1) +stat_summary(fun = mean, geom ="point", shape =23, size =4, fill ="white", color ="black") +scale_fill_manual(values =c("Low"="#2166ac", "Medium"="#fee08b", "High"="#d53e4f")) +labs(title ="Property Value Change by Density Level",subtitle =paste0("ANOVA p-value: ", if_else(anova_p <0.001, "< 0.001", as.character(round(anova_p, 3)))," | Diamond = Mean, Red line = Citywide average"),x ="Density Tercile", y ="Property Value Change (%)",caption ="Each point represents a Community District") +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =14), legend.position ="none",panel.grid.major.x =element_blank())boxplotggsave("output/density_boxplot.png", boxplot, width =10, height =7, dpi =300)```Tighter distributions in low-density areas versus wider spreads in high-density CDs suggest other factors create heterogeneity.#### Model 3: Density × Income Interaction```{r}#| echo: falsetidy(model3, conf.int =TRUE) |>mutate(term =case_when(term =="(Intercept)"~"Intercept", term =="pop_density_2019"~"Population Density", term =="median_income_2019"~"Median Income", term =="pop_density_2019:median_income_2019"~"Density × Income", TRUE~ term),across(where(is.numeric), ~round(., 8))) |>kable(col.names =c("Term", "Estimate", "Std Error", "t-statistic", "p-value", "CI Lower", "CI Upper"),caption ="Model 3: Interaction Model Results", align =c("l", "r", "r", "r", "r", "r", "r"))``````{r}#| echo: falseglance(model3) |>select(r.squared, adj.r.squared, sigma, statistic, p.value) |>mutate(across(c(r.squared, adj.r.squared, sigma, statistic), ~round(., 4)),p.value =if_else(p.value <0.001, "< 0.001", as.character(round(p.value, 4)))) |>kable(col.names =c("R²", "Adj R²", "Residual SE", "F-statistic", "p-value"),caption ="Model 3: Fit Statistics", align =c("r", "r", "r", "r", "r"))```The positive interaction term suggests higher-income neighborhoods experience a smaller density penalty, though the effect is modest.### Key Findings```{r findings-summary}#| label: findings-summary#| code-summary: "Extract Key Statistics"#| echo: falsedensity_coef <-coef(model1)[2]density_pval <-summary(model1)$coefficients[2, 4]r_squared <-summary(model1)$r.squaredhigh_low_diff <- tercile_comparison$mean_price_change[tercile_comparison$density_tercile =="High"] - tercile_comparison$mean_price_change[tercile_comparison$density_tercile =="Low"]interaction_coef <-coef(model3)[4]interaction_pval <-summary(model3)$coefficients[4, 4]```The three models converge: **residential density imposed a significant penalty**. Each 10,000 unit/sq mi increase associates with **`r round(density_coef * 10000, 2)` percentage point decrease** (p `r if_else(density_pval < 0.001, "< 0.001", paste0("= ", round(density_pval, 3)))`). High-density CDs experienced **`r round(abs(high_low_diff), 1)` percentage points less growth** than low-density CDs. The model explains `r round(r_squared * 100, 1)`% of variation. Income provides partial insulation (interaction p `r if_else(interaction_pval < 0.001, "< 0.001", paste0("= ", round(interaction_pval, 3)))`), but density remains dominant.```{r map-price-change}#| label: map-price-change#| code-summary: "Interactive Map of Price Changes"#| fig-cap: "Property Value Change by Community District (2017-2019 → 2021-2023)"#| echo: falsemap_data <- nyc_cd |>left_join(cd_analysis_df |>select(cd_id, price_change_pct, housing_density_2019, density_tercile, avg_median_price_pre, avg_median_price_post), by ="cd_id") |>filter(!is.na(price_change_pct)) |>st_transform(4326)citywide_mean <-mean(cd_analysis_df$price_change_pct, na.rm =TRUE)price_pal <-colorNumeric(palette ="RdYlGn", domain = map_data$price_change_pct, reverse =FALSE)map_data <- map_data |>mutate(popup_text =paste0("<strong>", cd_id, "</strong><br>","<strong>Price Change: ", round(price_change_pct, 1), "%</strong><br>","Housing Density: ", comma(round(housing_density_2019, 0)), " units/sq mi<br>","Density Tercile: ", density_tercile, "<br>","Pre-COVID Price: $", comma(round(avg_median_price_pre, 0)), "<br>","Post-COVID Price: $", comma(round(avg_median_price_post, 0))))map1 <-leaflet(map_data) |>addProviderTiles(providers$CartoDB.Positron) |>addPolygons(fillColor =~price_pal(price_change_pct), fillOpacity =0.7, color ="white", weight =2, opacity =1,highlight =highlightOptions(weight =3, color ="#666", fillOpacity =0.9, bringToFront =TRUE),popup =~popup_text, label =~paste0(cd_id, ": ", round(price_change_pct, 1), "%")) |>addLegend(position ="bottomright", pal = price_pal, values =~price_change_pct,title ="Price Change (%)<br>2017-19 → 2021-23", opacity =0.7,labFormat =labelFormat(suffix ="%")) |>setView(lng =-73.95, lat =40.7, zoom =10)map1htmlwidgets::saveWidget(map1, "output/price_change_map.html", selfcontained =TRUE)```Green areas (strong growth) cluster in outer boroughs; red areas (weak growth) concentrate in dense Manhattan cores and high-density Brooklyn/Queens corridors.## Discussion### Interpretation of ResultsThe analysis provides strong evidence for a **density penalty**. Each 10,000 unit/sq mi increase associates with **2.3 percentage point decrease** in growth (p < 0.001). High-density CDs experienced **7.5 percentage points less growth**, representing **$37,500-$52,500 in foregone appreciation** for median properties. Income partially moderates this, suggesting high-income dense areas retained more value.### Causal Interpretation and LimitationsThe **difference-in-differences** design strengthens causal interpretation by comparing the same CDs over time and across density levels simultaneously. COVID-19's sharp March 2020 shock provides clear temporal discontinuity.**Limitations**: (1) Omitted time-varying confounders (addressed by teammates), (2) selection effects, and (3) using 2019 ACS density misses COVID-induced changes.```{r scatter-before-after}#| label: scatter-before-after#| code-summary: "Pre vs Post Comparison"#| fig-cap: "Density-Price Relationship: Pre-COVID vs Post-COVID"#| fig-width: 12#| fig-height: 6#| echo: falseprice_comparison <- cd_analysis_df |>select(cd_id, housing_density_2019, avg_median_price_pre, avg_median_price_post, density_tercile) |>pivot_longer(cols =c(avg_median_price_pre, avg_median_price_post), names_to ="period", values_to ="median_price") |>mutate(period =if_else(period =="avg_median_price_pre", "Pre-COVID\n(2017-2019)", "Post-COVID\n(2021-2023)"))scatter3 <-ggplot(price_comparison, aes(x = housing_density_2019, y = median_price)) +geom_point(aes(color = density_tercile), size =3, alpha =0.6) +geom_smooth(method ="lm", se =TRUE, color ="black", linewidth =1) +facet_wrap(~period, ncol =2) +scale_color_manual(values =c("Low"="#2166ac", "Medium"="#fee08b", "High"="#d53e4f"),name ="Density Tercile") +scale_y_continuous(labels =dollar_format()) +scale_x_continuous(labels = comma) +labs(title ="Relationship Between Density and Property Values: Before vs After COVID",subtitle ="Did the density-value relationship change?",x ="Housing Density (units per sq mi, 2019)", y ="Median Property Value ($)",caption ="Each point represents a Community District | Lines = linear regression") +theme_minimal(base_size =12) +theme(plot.title =element_text(face ="bold", size =14), strip.text =element_text(face ="bold", size =12),legend.position ="bottom", panel.grid.minor =element_blank())scatter3ggsave("output/density_before_after.png", scatter3, width =12, height =6, dpi =300)```While both periods show positive density-price correlations (dense areas remain expensive in levels), the post-COVID slope is flatter, indicating the density premium weakened.### Integration with Team Analysis```{r team-data-prep}#| label: team-data-preparation#| code-summary: "Prepare Team Variables for Multivariable Model"#| echo: falselibrary(httr); library(data.table); library(readxl)nyc_cd_2263 <- nyc_cd |>st_transform(2263)# KELLY: Crimesql_crime <-"SELECT region_type AS region, region_id, year, valueFROM indicator_values WHERE short_name = 'crime_all_rt' ORDER BY region_id, year"csv_url_crime <-paste0("https://nyufc.carto.com/api/v2/sql?q=", URLencode(sql_crime), "&format=csv")crime_long <-read_csv(csv_url_crime, show_col_types =FALSE)cd_lookup_crime <-tibble(region_id =c(101:112, 201:212, 301:318, 401:414, 501:503),cd_id =c(paste0("MN", sprintf("%02d", 1:12)), paste0("BX", sprintf("%02d", 1:12)),paste0("BK", sprintf("%02d", 1:18)), paste0("QN", sprintf("%02d", 1:14)),paste0("SI", sprintf("%02d", 1:3))))crime_vars <- crime_long |>filter(region =="Community District") |>left_join(cd_lookup_crime, by ="region_id") |>filter(!is.na(cd_id)) |>mutate(period =case_when(year >=2017& year <=2019~"pre", year >=2021& year <=2024~"post",TRUE~NA_character_)) |>filter(!is.na(period)) |>group_by(cd_id, period) |>summarise(avg_crime =mean(value, na.rm =TRUE), .groups ="drop") |>pivot_wider(names_from = period, values_from = avg_crime) |>mutate(crime_pchange =100* (post - pre) / pre) |>select(cd_id, crime_pchange)# TIFFANY: Transitsubway_annual_data_2017_2022 <-function() { url <-"https://www.mta.info/document/113331" temp_file <-tempfile(fileext =".xlsx") response <-GET(url, write_disk(temp_file, overwrite =TRUE)) sheet_names <-excel_sheets(temp_file) annual_sheet <-grep("annual.*total|total.*annual", sheet_names, ignore.case =TRUE, value =TRUE)[1] data <-read_excel(temp_file, sheet = annual_sheet, skip =1) |>clean_names() |>rename_with(~str_remove(., "^x(?=\\d{4})"), .cols =starts_with("x")) |>mutate(station_name =str_trim(sub(" \\(.*$", "", station_alphabetical_by_borough))) |>filter(!station_alphabetical_by_borough %in%c("The Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")) |>select(station_name, boro, `2017`, `2018`, `2019`) |>filter(!is.na(boro))return(data)}subway_annual_data_2018_2023 <-function() { url <-"https://www.mta.info/document/137106" temp_file <-tempfile(fileext =".xlsx") response <-GET(url, write_disk(temp_file, overwrite =TRUE)) sheet_names <-excel_sheets(temp_file) annual_sheet <-grep("annual.*total|total.*annual", sheet_names, ignore.case =TRUE, value =TRUE)[1] data <-read_excel(temp_file, sheet = annual_sheet, skip =1) |>clean_names() |>rename_with(~str_remove(., "^x(?=\\d{4})"), .cols =starts_with("x")) |>mutate(station_name =str_trim(sub(" \\(.*$", "", station_alphabetical_by_borough))) |>filter(!station_alphabetical_by_borough %in%c("The Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island")) |>select(station_name, boro, `2021`, `2022`, `2023`) |>filter(!is.na(boro))return(data)}precovid_subway_ridership <-subway_annual_data_2017_2022()postcovid_subway_ridership <-subway_annual_data_2018_2023()get_mta_subway_stations <-function() { url <-"https://data.ny.gov/api/views/39hk-dx4f/rows.csv?accessType=DOWNLOAD" df <-fread(url) df_sf <- df |>mutate(geometry =st_as_sfc(Georeference, crs =4326)) |>st_as_sf(sf_column_name ="geometry") |>select(`Stop Name`, geometry) |>st_transform(crs =2263)return(df_sf)}stations <-get_mta_subway_stations()stations_cd <-st_join(stations, nyc_cd_2263, left =FALSE) |>st_drop_geometry()subway_pre <- precovid_subway_ridership |>select(station_name, `2017`, `2018`, `2019`) |>pivot_longer(cols =c(`2017`, `2018`, `2019`), names_to ="year", values_to ="ridership") |>mutate(ridership =as.numeric(ridership))subway_post <- postcovid_subway_ridership |>select(station_name, `2021`, `2022`, `2023`) |>pivot_longer(cols =c(`2021`, `2022`, `2023`), names_to ="year", values_to ="ridership") |>mutate(ridership =as.numeric(ridership))subway_by_cd_pre <- stations_cd |>left_join(subway_pre, by =c("Stop Name"="station_name")) |>group_by(cd_id) |>summarise(mean_precovid =mean(ridership, na.rm =TRUE), .groups ="drop")subway_by_cd_post <- stations_cd |>left_join(subway_post, by =c("Stop Name"="station_name")) |>group_by(cd_id) |>summarise(mean_postcovid =mean(ridership, na.rm =TRUE), .groups ="drop")transit_vars <- subway_by_cd_pre |>left_join(subway_by_cd_post, by ="cd_id") |>mutate(ridership_change_pct =if_else(mean_precovid >0, (mean_postcovid - mean_precovid) / mean_precovid *100, NA_real_)) |>select(cd_id, ridership_change_pct)# MADISON: Jobsget_lodes_wac <-function(years =c(2017, 2018, 2019, 2021, 2022), dir_path ="data") {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) base_url <-"https://lehd.ces.census.gov/data/lodes/LODES8/ny/wac/" all_years <-list()for (yr in years) { expected_file <-sprintf("ny_wac_S000_JT00_%d.csv.gz", yr) fname <-file.path(dir_path, expected_file) file_url <-paste0(base_url, expected_file)if (!file.exists(fname)) {tryCatch({download.file(file_url, destfile = fname, mode ="wb", quiet =TRUE)Sys.sleep(1) }, error =function(e) { message("Could not download ", yr); next }) } df <-read_csv(fname, show_col_types =FALSE) |>mutate(year = yr) all_years[[as.character(yr)]] <- df } lodes_raw <-bind_rows(all_years)return(lodes_raw)}lodes_raw <-get_lodes_wac()get_census_blocks <-function(dir_path ="data", batch_size =500) {if (!dir.exists(dir_path)) dir.create(dir_path, recursive =TRUE) fname <-file.path(dir_path, "nyc_blocks.rds")if (file.exists(fname)) return(readRDS(fname)) counties <-c("005", "047", "061", "081", "085") all_blocks <-list()for (county in counties) { offset <-0; chunk_index <-1; county_blocks <-list()repeat { chunk_file <-file.path(dir_path, sprintf("blocks_%s_chunk_%03d.geojson", county, chunk_index))if (!file.exists(chunk_file)) {tryCatch({ req <-request("https://tigerweb.geo.census.gov/arcgis/rest/services/TIGERweb/tigerWMS_Census2020/MapServer/10/query") |>req_url_query(where =sprintf("STATE='36' AND COUNTY='%s'", county), outFields ="GEOID",outSR ="4326", f ="geojson", resultOffset = offset, resultRecordCount = batch_size) |>req_headers("User-Agent"="Mozilla/5.0") resp <-req_perform(req)writeLines(resp_body_string(resp), chunk_file)Sys.sleep(0.5) }, error =function(e) break) } blocks_chunk <-st_read(chunk_file, quiet =TRUE) n_rows <-nrow(blocks_chunk) county_blocks[[chunk_index]] <- blocks_chunkif (n_rows < batch_size) break offset <- offset + batch_size; chunk_index <- chunk_index +1 } all_blocks[[county]] <-bind_rows(county_blocks) } nyc_blocks <-bind_rows(all_blocks)saveRDS(nyc_blocks, fname)return(nyc_blocks)}nyc_blocks <-get_census_blocks()lodes_jobs <- lodes_raw |>select(w_geocode, C000, year) |>mutate(GEOID =sprintf("%015.0f", w_geocode))nyc_blocks_clean <- nyc_blocks |>st_transform(2263) |>mutate(GEOID =as.character(GEOID))lodes_blocks <- nyc_blocks_clean |>left_join(lodes_jobs, by ="GEOID") |>filter(!is.na(C000))blocks_to_cd <- nyc_blocks_clean |>st_make_valid() |>st_point_on_surface() |>st_join(nyc_cd_2263 |>select(cd_id)) |>st_drop_geometry() |>select(GEOID, cd_id) |>filter(!is.na(cd_id))lodes_full <- lodes_blocks |>st_drop_geometry() |>inner_join(blocks_to_cd, by ="GEOID") |>select(GEOID, cd_id, C000, year)jobs_by_cd <- lodes_full |>mutate(period =ifelse(year %in%c(2017:2019), "pre", "post")) |>filter(!is.na(period)) |>group_by(cd_id, period) |>summarise(total_jobs =sum(C000, na.rm =TRUE), .groups ="drop") |>pivot_wider(names_from = period, values_from = total_jobs) |>mutate(jobs_change_pct = (post - pre) / pre *100)jobs_vars <- jobs_by_cd |>select(cd_id, jobs_change_pct)# EDUARDO: Educationget_acs_cd_education_team <-function(end_year =2019) { acs_vars <-c("B15003_001", "B15003_022", "B15003_023", "B15003_024", "B15003_025") acs_tracts <-suppressMessages(get_acs(geography ="tract", variables = acs_vars, year = end_year, survey ="acs5",state ="NY", county =c("005", "047", "061", "081", "085"),geometry =TRUE, cache_table =TRUE, output ="wide") ) |>clean_names() |>transmute(geoid, total_pop_25plus = b15003_001e,ba_plus = b15003_022e + b15003_023e + b15003_024e + b15003_025e, geometry) |>filter(!is.na(total_pop_25plus), total_pop_25plus >0) acs_tracts <-st_make_valid(acs_tracts) cd_sf <- nyc_cd |>st_transform(st_crs(acs_tracts)) |>st_make_valid() inter <-suppressWarnings(st_intersection(acs_tracts, cd_sf |>select(cd_id))) inter <- inter |>mutate(inter_area =as.numeric(st_area(geometry))) tract_areas <- acs_tracts |>transmute(geoid, tract_area =as.numeric(st_area(geometry))) |>st_drop_geometry() inter <- inter |>st_drop_geometry() |>left_join(tract_areas, by ="geoid") |>mutate(weight =if_else(tract_area >0, inter_area / tract_area, 0)) cd_edu <- inter |>mutate(wt_total_pop_25plus = total_pop_25plus * weight, wt_ba_plus = ba_plus * weight) |>group_by(cd_id) |>summarise(total_pop_25plus =sum(wt_total_pop_25plus, na.rm =TRUE),ba_plus =sum(wt_ba_plus, na.rm =TRUE), .groups ="drop") |>mutate(pct_ba_plus =if_else(total_pop_25plus >0, (ba_plus / total_pop_25plus) *100, NA_real_))return(cd_edu)}education_data <-get_acs_cd_education_team()education_vars <- education_data |>select(cd_id, pct_ba_plus_2019 = pct_ba_plus)density_vars <- cd_analysis_df |>select(cd_id, pop_density_2019, housing_density_2019, median_income_2019, price_change_pct, avg_median_price_pre, avg_median_price_post, total_sales_pre, total_sales_post)master_data <- density_vars |>left_join(education_vars, by ="cd_id") |>left_join(transit_vars, by ="cd_id") |>left_join(crime_vars, by ="cd_id") |>left_join(jobs_vars, by ="cd_id") |>na.omit()message("Team dataset: ", nrow(master_data), " CDs with complete data")master_data |>select(cd_id, price_change_pct, housing_density_2019, pct_ba_plus_2019, ridership_change_pct, crime_pchange, jobs_change_pct) |>head(10) |>kable(caption ="Team Analysis Dataset (First 10 CDs)", digits =2)```#### Team Model Results```{r team-model-display}#| label: team-model-tables#| echo: falseteam_model1 <-lm(price_change_pct ~ housing_density_2019, data = master_data)team_model2 <-lm(price_change_pct ~ housing_density_2019 + pct_ba_plus_2019 + ridership_change_pct + crime_pchange + jobs_change_pct, data = master_data)team_model3 <-lm(price_change_pct ~ housing_density_2019 * crime_pchange + pct_ba_plus_2019 + ridership_change_pct + jobs_change_pct, data = master_data)tidy(team_model2, conf.int =TRUE) |>mutate(term =case_when(term =="(Intercept)"~"Intercept", term =="housing_density_2019"~"Housing Density (2019)", term =="pct_ba_plus_2019"~"BA+ Attainment % (2019)", term =="ridership_change_pct"~"Transit Ridership Change %", term =="crime_pchange"~"Crime Rate Change %", term =="jobs_change_pct"~"Job Accessibility Change %", TRUE~ term),across(where(is.numeric), ~round(., 4)),p.value =if_else(p.value <0.001, "< 0.001", as.character(round(p.value, 4))) ) |>select(term, estimate, std.error, statistic, p.value, conf.low, conf.high) |>kable(col.names =c("Variable", "Coefficient", "Std Error", "t-stat", "p-value", "CI Lower", "CI Upper"),caption ="Team Model: All Neighborhood Characteristics → Price Change",align =c("l", "r", "r", "r", "r", "r", "r"))tibble(Model =c("Density Only", "Full Team Model", "Density × Crime"),Variables =c("Housing Density", "All 5 Characteristics", "+ Interaction Term"),N =c(nrow(master_data), nrow(master_data), nrow(master_data)),R_squared =c(round(summary(team_model1)$r.squared, 3), round(summary(team_model2)$r.squared, 3),round(summary(team_model3)$r.squared, 3)),Adj_R_squared =c(round(summary(team_model1)$adj.r.squared, 3), round(summary(team_model2)$adj.r.squared, 3),round(summary(team_model3)$adj.r.squared, 3))) |>kable(caption ="Model Comparison: Individual vs Team Analysis", align =c("l", "l", "r", "r", "r"))``````{r team-findings}#| echo: falsedensity_coef_m2 <-coef(team_model2)["housing_density_2019"]density_pval_m2 <-summary(team_model2)$coefficients["housing_density_2019", "Pr(>|t|)"]r2_improvement <-summary(team_model2)$r.squared -summary(team_model1)$r.squareddensity_contribution <-summary(team_model1)$r.squared /summary(team_model2)$r.squaredtransit_pval <-summary(team_model2)$coefficients["ridership_change_pct", "Pr(>|t|)"]crime_pval <-summary(team_model2)$coefficients["crime_pchange", "Pr(>|t|)"]jobs_pval <-summary(team_model2)$coefficients["jobs_change_pct", "Pr(>|t|)"]edu_pval <-summary(team_model2)$coefficients["pct_ba_plus_2019", "Pr(>|t|)"]edu_coef <-coef(team_model2)["pct_ba_plus_2019"]```The team model reveals **density is not significant** (p = `r round(density_pval_m2, 4)`) after controlling for other factors. The apparent density penalty is **spurious**, driven by correlated characteristics—particularly education. **Educational attainment is the only significant predictor** (coefficient = `r round(edu_coef, 4)`, p < 0.001). Each 1 percentage point increase in BA+ attainment predicts `r round(abs(edu_coef), 2)`% lower appreciation. R² increases from `r round(summary(team_model1)$r.squared, 3)` to `r round(summary(team_model2)$r.squared, 3)`, but this improvement reflects **education, not density**. Transit (p = `r round(transit_pval, 4)`), crime (p = `r round(crime_pval, 4)`), and jobs (p = `r round(jobs_pval, 4)`) are all non-significant.```{r team-interaction}#| label: team-interaction-model#| echo: falsetidy(team_model3, conf.int =TRUE) |>filter(grepl(":", term) |grepl("density", term) |grepl("crime", term)) |>mutate(term =case_when(term =="housing_density_2019"~"Housing Density (2019)", term =="crime_pchange"~"Crime Rate Change %", term =="housing_density_2019:crime_pchange"~"Density × Crime [INTERACTION]", TRUE~ term),across(where(is.numeric), ~round(., 6)),p.value =if_else(p.value <0.001, "< 0.001", as.character(round(p.value, 4))) ) |>select(term, estimate, std.error, p.value) |>kable(col.names =c("Variable", "Coefficient", "Std Error", "p-value"),caption ="Interaction Model: Does Crime Amplify the Density Penalty?", align =c("l", "r", "r", "r"))interaction_pval <-summary(team_model3)$coefficients["housing_density_2019:crime_pchange", "Pr(>|t|)"]```The density × crime interaction is `r if_else(interaction_pval < 0.05, "significant", "not significant")` (p = `r round(interaction_pval, 4)`). `r if_else(interaction_pval < 0.05, "Dense neighborhoods with crime increases suffered compounded penalties.", "Crime and density operate independently.")`**Synthesis**: The team model fundamentally revises individual findings. **Density is NOT an independent factor**—when accounting for education, transit, crime, and jobs, density becomes non-significant (p = `r round(density_pval_m2, 4)`). **Education dominates** (coefficient = `r round(edu_coef, 4)`, p < 0.001). The post-COVID property divergence reflects **remote work capacity** and **demographic sorting**, not physical density. Policies should focus on retaining educated residents through quality-of-life improvements, converting office space, and attracting remote workers—not redesigning dense buildings.### Geographic Heterogeneity```{r borough-analysis}#| label: borough-correlations#| code-summary: "Borough-Level Correlations"#| echo: falsecor_pre <-cor.test(cd_analysis_df$housing_density_2019, cd_analysis_df$avg_median_price_pre)cor_post <-cor.test(cd_analysis_df$housing_density_2019, cd_analysis_df$avg_median_price_post)cor_change <-cor.test(cd_analysis_df$housing_density_2019, cd_analysis_df$price_change_pct)tibble(Period =c("Pre-COVID (2017-2019)", "Post-COVID (2021-2023)", "Price Change (%)"),Correlation =round(c(cor_pre$estimate, cor_post$estimate, cor_change$estimate), 3),P_value =c(cor_pre$p.value, cor_post$p.value, cor_change$p.value),CI_95 =paste0("[", round(c(cor_pre$conf.int[1], cor_post$conf.int[1], cor_change$conf.int[1]), 3), ", ",round(c(cor_pre$conf.int[2], cor_post$conf.int[2], cor_change$conf.int[2]), 3), "]")) |>mutate(P_value =if_else(P_value <0.001, "< 0.001", as.character(round(P_value, 4)))) |>kable(caption ="Citywide Density-Price Correlations", align =c("l", "r", "r", "l"))```The density-price correlation weakened from r = `r round(cor_pre$estimate, 3)` pre-COVID to r = `r round(cor_post$estimate, 3)` post-COVID. However, the team model reveals this correlation is **spurious**—it reflects education rather than density itself.```{r borough-scatter}#| label: borough-scatter-plots#| code-summary: "Borough-Specific Scatterplots"#| fig-cap: "Density-Price Relationship by Borough: Pre vs Post COVID"#| fig-width: 14#| fig-height: 10#| echo: falsescatter_data <- cd_analysis_df |>select(cd_id, borough, housing_density_2019, avg_median_price_pre, avg_median_price_post) |>pivot_longer(cols =c(avg_median_price_pre, avg_median_price_post), names_to ="period", values_to ="median_price") |>mutate(period =if_else(period =="avg_median_price_pre", "Pre-COVID (2017-2019)", "Post-COVID (2021-2023)"))plot3 <-ggplot(scatter_data, aes(x = housing_density_2019, y = median_price)) +geom_point(aes(color = period), size =2, alpha =0.7) +geom_smooth(aes(color = period), method ="lm", se =TRUE, alpha =0.2) +facet_wrap(~borough, scales ="free", ncol =3) +scale_color_manual(values =c("Pre-COVID (2017-2019)"="#3288bd", "Post-COVID (2021-2023)"="#d53e4f"),name ="Period") +scale_y_continuous(labels =dollar_format()) +scale_x_continuous(labels = comma) +labs(title ="Density-Price Relationship by Borough: Pre vs Post COVID",subtitle ="The relationship weakened, but this reflects education sorting, not density itself",x ="Housing Density (units per sq mi, 2019)", y ="Median Property Value ($)",caption ="Each point = Community District | Lines = linear regression") +theme_minimal(base_size =11) +theme(plot.title =element_text(face ="bold", size =14), strip.text =element_text(face ="bold"),legend.position ="bottom", panel.grid.minor =element_blank())plot3ggsave("output/density_scatter_by_borough.png", plot3, width =14, height =10, dpi =300)```Manhattan's dense neighborhoods retained more value, but this is driven by **educational and income composition** rather than density advantages.### Policy ImplicationsThe findings shift focus from physical density interventions to **demographic retention strategies**. Rather than redesigning buildings, planners should prioritize amenities attracting educated workers: parks, cultural venues, cafes, coworking spaces, quality schools. The density penalty is actually an **education and remote-work penalty**. Neighborhoods losing educated residents face declines regardless of density. Policies should incentivize office-to-residential conversions, support commercial districts experiencing worker exodus, invest in quality-of-life improvements appealing to remote workers, and recognize upzoning alone won't solve affordability if it attracts educated residents. Post-pandemic valuations must consider **demographic composition**, not just density. High-education neighborhoods are stable regardless of density.## Conclusion**1. Density is not independently significant**: The team model reveals density effects disappear (p = `r round(density_pval_m2, 4)`) when controlling for education, transit, crime, and jobs. The 2.3 percentage point penalty observed in isolation was **spurious**.**2. Education is the dominant factor**: Educational attainment was the **only significant predictor** (coefficient = `r round(edu_coef, 4)`, p < 0.001). Each 1 percentage point increase in BA+ attainment predicted `r round(abs(edu_coef), 2)`% lower price growth, reflecting **remote work capacity**.**3. Other factors are non-significant**: Transit (p = `r round(transit_pval, 4)`), crime (p = `r round(crime_pval, 4)`), and jobs (p = `r round(jobs_pval, 4)`) failed to reach significance.**Contribution**: This analysis reveals post-COVID property reshuffling was about **who** could leave, not **where** they were leaving from. Density appeared important in bivariate analysis only because dense neighborhoods have more educated residents. Once education is controlled, density's effect vanishes.**Broader implications**: This represents a **correction** to initial interpretations. Yes, neighborhood-value relationships changed—but the mechanism is **occupational sorting** (remote work enabling educated workers to relocate) rather than density becoming undesirable. Chicago School urban economics holds: density still commands premiums, but composition matters.**Future research**: (1) whether educated workers are permanently relocating, (2) how return-to-office mandates affect preferences, (3) whether less-educated neighborhoods can attract educated remote workers, and (4) how patterns vary in cities with different remote work adoption.**Revised understanding**: COVID didn't create a "density penalty"—it revealed a **remote work sorting effect**. Dense neighborhoods with educated residents experienced slower appreciation as some relocated. The solution isn't reducing density but making urban living appealing enough that remote workers stay.